Estimation of vocal-tract shape from speech spectrum and speech resynthesis based on a generative model
Abstract
Precise control of articulatory parameters is difficult, which prevents a physical model from generating natural-sounding speech signals. To determine vocal-tract shape from speech, this paper presents an inversion method for simultaneously estimating the cross-sectional area and length of the vocal tract. In addition, we performed speech resynthesis from a time series of estimated vocal-tract shapes. The vocal-tract shape is determined through an iterative procedure that gradually optimizes the parameter values to produce the target speech spectrum. The vocal-tract shape is updated using a sensitivity function that represents the change in formant frequency caused by a small perturbation of the vocal-tract shape. When combined with a perturbation relationship between speech spectrum parameters (i.e., cepstrum parameters) and formants, our method effectively optimizes the vocal-tract shape. We quantitatively examined the accuracy using area function data for 10 isolated vowels. The results showed that the average area error was 0.43 cm² and the average length error was 0.23 cm. This indicates that the vocal-tract shape was determined with satisfactory accuracy. We also performed an estimation experiment for continuous speech and synthesized speech from the estimated vocal-tract shape.
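The iterative, sensitivity-driven update described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes a lossless concatenated-tube model (closed glottis, ideal open lips), matches formants only rather than cepstrum parameters, approximates the sensitivity function by finite differences, and uses a damped minimum-norm Gauss-Newton step with backtracking. All function names and the constants `C` and `RHO` are hypothetical choices made for illustration.

```python
import numpy as np

C = 35000.0    # speed of sound in cm/s (assumed value)
RHO = 1.14e-3  # air density in g/cm^3 (assumed value)

def lip_denominator(freqs, areas, length):
    """D(f) of the tube chain matrix; U_lips/U_glottis = 1/D, so formants are zeros of D."""
    sec_len = length / len(areas)
    k = 2.0 * np.pi * np.asarray(freqs) / C
    cos_kl, sin_kl = np.cos(k * sec_len), np.sin(k * sec_len)
    total = np.tile(np.eye(2), (len(k), 1, 1))
    for area in areas:  # sections ordered glottis -> lips
        z = RHO * C / area
        # Real-valued form of the lossless matrix [[cos, j*z*sin], [j*sin/z, cos]].
        sec = np.empty((len(k), 2, 2))
        sec[:, 0, 0] = cos_kl
        sec[:, 0, 1] = -z * sin_kl
        sec[:, 1, 0] = sin_kl / z
        sec[:, 1, 1] = cos_kl
        total = total @ sec
    return total[:, 1, 1]

def formants(areas, length, n_formants=3, f_max=10000.0, step=10.0):
    """First n_formants resonances (Hz): grid search for sign changes, then bisection."""
    grid = np.arange(0.5 * step, f_max, step)
    d = lip_denominator(grid, areas, length)
    idx = np.where(d[:-1] * d[1:] < 0)[0][:n_formants]
    if len(idx) < n_formants:
        raise ValueError("fewer zero crossings than requested formants")
    lo, hi, d_lo = grid[idx], grid[idx + 1], d[idx]
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        same_side = np.sign(lip_denominator(mid, areas, length)) == np.sign(d_lo)
        lo = np.where(same_side, mid, lo)
        hi = np.where(same_side, hi, mid)
    return 0.5 * (lo + hi)

def predict(params, n_formants):
    """Formants for log-parameters [log A_1, ..., log A_N, log L]."""
    return formants(np.exp(params[:-1]), np.exp(params[-1]), n_formants)

def sensitivity(params, n_formants, eps=1e-4):
    """Finite-difference sensitivity matrix d(formant)/d(log-parameter)."""
    base = predict(params, n_formants)
    jac = np.empty((n_formants, len(params)))
    for j in range(len(params)):
        bumped = params.copy()
        bumped[j] += eps
        jac[:, j] = (predict(bumped, n_formants) - base) / eps
    return base, jac

def estimate_shape(target, init_areas, init_length, n_iter=30, damping=1e-3):
    """Jointly update section areas and tract length to match target formants."""
    n = len(target)
    params = np.log(np.append(init_areas, init_length))
    for _ in range(n_iter):
        current, jac = sensitivity(params, n)
        resid = target - current
        err = np.linalg.norm(resid)
        if err < 1e-3:
            break
        # Damped minimum-norm step (the problem has more unknowns than formants).
        step = jac.T @ np.linalg.solve(jac @ jac.T + damping * np.eye(n), resid)
        step *= min(1.0, 0.5 / np.max(np.abs(step)))  # limit change per iteration
        scale = 1.0
        while scale > 1e-3:  # backtracking: accept only steps that reduce the error
            trial = params + scale * step
            if np.linalg.norm(target - predict(trial, n)) < err:
                params = trial
                break
            scale *= 0.5
    return np.exp(params[:-1]), np.exp(params[-1])
```

As a usage example, `formants(np.full(8, 3.0), 17.5)` evaluates a uniform tube, and `estimate_shape(target, np.full(8, 3.0), 17.5)` starts the search from that tube. Because a few formants do not determine the area function uniquely, this sketch recovers a shape that reproduces the target formants, not necessarily the original shape; the additional spectral (cepstrum) information mentioned in the abstract is left out here.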
Related resources
Evaluation of a Speech Bandwidth Extension Algorithm Based on Vocal Tract Shape Estimation
In this paper, we evaluate a speech bandwidth extension (BWE) algorithm which involves phonetic and speaker dependent estimation of the high-band part of the spectral envelope. The BWE algorithm extracts speech phoneme information by using a hidden Markov model. Speaker vocal tract shape information corresponding to the wideband signal is extracted by a codebook search. Postprocessing of the es...
Artificial Bandwidth Extension of Band Limited Speech Based on Vocal Tract Shape Estimation
This research addresses the challenge of improving degraded telephone narrowband speech quality caused by signal band limitation to the range of 0.3–3.4 kHz. We introduce a new speech bandwidth extension (BWE) algorithm which estimates and produces the high-band spectral components ranging from 3.4 kHz to 7 kHz, and emphasizes the lower spectral components around 300 Hz. Using a speech producti...
Methods Used For Vocal Tract Shape Estimation and There Applicability for Children
Vocal tract shape estimation is essential for the development of speech training aids for the hearing impaired. Speech therapists have concluded from their research that if speech disorders are detected at an early stage, their correction becomes faster and more accurate. In the case of adults, vocal tract shape estimation can be performed by using techniques like X-Ray, MRI or usin...
Validation of Optimum Algorithm Parameters Required to Estimate Vocal Tract Shape for Children Using LPC Analysis
Severe or profound deafness in hearing impaired children can curb their ability to speak due to the lack of auditory feedback. There have been considerable efforts to develop commercial speech training aids for such children which give feedback of acoustic and articulatory parameters. Speech training aids based on visual feedback of vocal tract shape (VTS) are reported to be useful for the...
Phase equalization-based autoregressive model of speech signals
This paper presents a novel method for estimating a vocal-tract spectrum from speech signals, based on a modeling of excitation signals of voiced speech. A formulation of linear prediction coding with an impulse train is derived and applied to the phase-equalized speech signals, which are converted from the original speech signals by phase equalization. Preliminary results show that the proposed me...